Using Bitmap Indexing Technology for Combined Numerical and Text Queries
Abstract
In this paper, we describe a strategy for using compressed bitmap indices to speed up queries on both numerical data and text documents. Thanks to an efficient compression algorithm, these indices remain compact even when they contain millions of distinct terms. Moreover, bitmap indices can answer Boolean queries over text documents involving multiple query terms very efficiently. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms and for queries that produce a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of these shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system, MonetDB/FastBit, not only searches a single table efficiently, as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit provides a very efficient mechanism for retrieving result records.
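The idea of answering a combined numerical and text query with bitmaps can be illustrated with a minimal sketch. It assumes uncompressed bitmaps stored as Python integers, whereas the abstract describes compressed bitmaps; the class `BitmapIndex`, the helper `rows`, and the sample records are hypothetical and are not part of FastBit or MonetDB.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy uncompressed bitmap index: one integer bitmap per distinct value.

    Bit i of a bitmap is set when row i contains that value.
    This is an illustration only, not FastBit's data structure or API.
    """

    def __init__(self):
        self.bitmaps = defaultdict(int)
        self.num_rows = 0

    def add_row(self, values):
        """Index one row; `values` is an iterable of the row's values or terms."""
        row = self.num_rows
        for value in set(values):
            self.bitmaps[value] |= 1 << row
        self.num_rows += 1

    def lookup(self, value):
        """Bitmap of rows containing `value` (0 if the value never occurs)."""
        return self.bitmaps.get(value, 0)

    def range(self, lo, hi):
        """Bitmap of rows whose value v satisfies lo <= v <= hi."""
        result = 0
        for value, bitmap in self.bitmaps.items():
            if lo <= value <= hi:
                result |= bitmap
        return result


def rows(bitmap):
    """Decode a bitmap into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical records: (year, text).
records = [
    (2004, "bitmap index compression"),
    (2006, "bitmap index text query"),
    (2006, "inverted index text"),
    (2005, "join query processing"),
]

year_index = BitmapIndex()
text_index = BitmapIndex()
for year, text in records:
    year_index.add_row([year])
    text_index.add_row(text.split())

# Numerical condition AND two text terms, combined with bitwise AND.
hits = (
    year_index.range(2005, 2006)
    & text_index.lookup("index")
    & text_index.lookup("text")
)
print(rows(hits))  # [1, 2]
```

Each condition yields one bitmap, so a Boolean query over several terms reduces to bitwise operations, regardless of whether a condition comes from a numerical column or from text.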
Related resources
FastBit: An Efficient Indexing Technology For Accelerating Data-Intensive Science
FastBit is a software tool for searching large read-only datasets. It organizes user data in a column-oriented structure which is efficient for on-line analytical processing (OLAP), and utilizes compressed bitmap indices to further speed up query processing. Analyses have proven the compressed bitmap index used in FastBit to be theoretically optimal for one-dimensional queries. Compared with oth...
Bitmap Indices for Fast End-User Physics Analysis in ROOT
Most physics analysis jobs involve multiple selection steps on the input data. These selection steps are called cuts or queries. A common strategy to implement these queries is to read all input data from files and then process the queries in memory. In many applications the number of variables used to define these queries is a relatively small portion of the overall data set; therefore reading al...
Using Bitmap Index for Joint Queries on Structured and Text Data
The database and the information retrieval communities have been working on separate sets of techniques for querying structured data and text data, but there is a growing need to handle these types of data together. In this paper, we present a strategy to efficiently answer joint queries on both types of data. By using an efficient compression algorithm, our compressed bitmap indexes, called Fa...
Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes
Bitmap indexing is a technique to index data. The main advantage of bitmap indexing is that boolean operations on bitmaps are very fast. This is essential for queries in OLAP applications. Typically, bitmap indexing is used for low cardinality attributes since the overall space requirement depends on the cardinality. For high cardinality attributes, a technique of associating a range of contigu...
Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes
Set-valued attributes are convenient to model complex objects occurring in the real world. Currently available database systems support the storage of set-valued attributes in relational tables but contain no primitives to query them efficiently. Queries involving set-valued attributes either perform full scans of the source data or make multiple passes over single-value indexes to reduce the n...
Publication date: 2006